Object detection through search with a foveated visual system
Abstract
Humans and many other species sense visual information with varying spatial resolution across the visual field (foveated vision) and deploy eye movements to actively sample regions of interest in scenes. The advantage of such a varying-resolution architecture is a reduced computational, and hence metabolic, cost. But what are the performance costs of such a processing strategy relative to a scheme that processes the whole visual field at high spatial resolution? Here we first focus on visual search and combine object detectors from computer vision with a recent model of peripheral pooling regions found at the V1 layer of the human visual system. We develop a foveated object detector that processes the entire scene with varying resolution, uses retino-specific object detection classifiers to guide eye movements, aligns its fovea with regions of interest in the input image, and integrates observations across multiple fixations. We compared the foveated object detector against a non-foveated version of the same object detector, which processes the entire image at homogeneous high spatial resolution. We evaluated the accuracy of the foveated and non-foveated object detectors in identifying 20 different object classes in scenes from a standard computer vision data set (the PASCAL VOC 2007 dataset). We show that the foveated object detector can approximate the performance of the object detector with homogeneous high-spatial-resolution processing while bringing significant computational cost savings. Additionally, we assessed the impact of foveation on the computation of bottom-up saliency. An implementation of a simple foveated bottom-up saliency model with eye movements agreed with a non-foveated high-resolution saliency model in its selection of the top salient regions of scenes.
Together, our results might help explain the evolution of foveated visual systems with eye movements as a solution that preserves perceptual performance in visual search while resulting in computational and metabolic savings to the brain.
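The loop described in the abstract (view the scene at a resolution that falls off with eccentricity, move the fovea to the most promising region, integrate observations across fixations) can be illustrated with a toy sketch. This is not the authors' detector. It assumes a precomputed 2D map of detector scores. It replaces the V1 pooling-region model with square averaging windows whose half-width grows linearly with distance from fixation. It integrates fixations by keeping, for each location, the observation made closest to the fovea. The names `foveate`, `foveated_search`, `slope`, and `n_moves` are hypothetical.

```python
import numpy as np


def foveate(scores, fixation, slope=0.25):
    """Pool a score map with windows that grow with eccentricity.

    Assumption: square windows with half-width floor(slope * eccentricity)
    stand in for peripheral pooling regions. Returns (pooled map, eccentricity).
    """
    h, w = scores.shape
    # Integral image with a zero border, so any window sum is four lookups.
    integral = np.zeros((h + 1, w + 1))
    integral[1:, 1:] = scores.cumsum(axis=0).cumsum(axis=1)

    ys, xs = np.mgrid[0:h, 0:w]
    ecc = np.hypot(ys - fixation[0], xs - fixation[1])
    r = np.floor(slope * ecc).astype(int)  # pooling half-width per pixel

    # Window bounds, clipped to the image (upper bounds are exclusive).
    y0, y1 = np.clip(ys - r, 0, h), np.clip(ys + r + 1, 0, h)
    x0, x1 = np.clip(xs - r, 0, w), np.clip(xs + r + 1, 0, w)

    total = (integral[y1, x1] - integral[y0, x1]
             - integral[y1, x0] + integral[y0, x0])
    return total / ((y1 - y0) * (x1 - x0)), ecc


def foveated_search(scores, start, n_moves=10, slope=0.25):
    """Greedy search: fixate the peak of the integrated foveated evidence."""
    fixation = (int(start[0]), int(start[1]))
    path = [fixation]
    belief = np.zeros(scores.shape)           # integrated evidence per pixel
    best_ecc = np.full(scores.shape, np.inf)  # closest view of each pixel so far

    for _ in range(n_moves):
        view, ecc = foveate(scores, fixation, slope)
        # Keep, per pixel, the observation made nearest to the fovea.
        sharper = ecc < best_ecc
        belief[sharper] = view[sharper]
        best_ecc[sharper] = ecc[sharper]

        nxt = np.unravel_index(np.argmax(belief), belief.shape)
        nxt = (int(nxt[0]), int(nxt[1]))
        if nxt == fixation:  # peak is already under the fovea: stop
            break
        fixation = nxt
        path.append(fixation)
    return path


# Toy scene: a single strong detector response far from the starting fixation.
scores = np.zeros((64, 64))
scores[40, 50] = 1.0
path = foveated_search(scores, start=(0, 0))
print(path)
```

In this toy scene the final fixation coincides with the full-resolution maximum of the score map, although each view resolves only the neighborhood of the fovea exactly. Real detector responses are noisy, so this greedy rule could stop at a local peak.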
Similar resources
Can Peripheral Representations Improve Clutter Metrics on Complex Scenes?
Previous studies have proposed image-based clutter measures that correlate with human search times and/or eye movements. However, most models do not take into account the fact that the effects of clutter interact with the foveated nature of the human visual system: visual clutter further from the fovea has an increasing detrimental influence on perception. Here, we introduce a new foveated clut...
Vision-model-based image foveation and motion estimation
Eli Peli, The Schepens Eye Research Institute, Boston, Massachusetts. Abstract. Foveated imaging systems applicable in various single-user displays mimic the visual system’s image structure, where resolution decreases gradually away from the fovea. The main benefit is the low average image resolution while maintaining h...
Fast Object Detection with Foveated Imaging and Virtual Saccades on Resource Limited Robots
This paper describes the use of foveated imaging and virtual saccades to identify visual objects using both colour and edge features. Vision processing is a resource hungry operation at the best of times. When the demands require robust, real-time performance with a limited embedded processor, the challenge is significant. Our domain of application is the RoboCup Standard Platform League soccer...
On the advantages of foveal mechanisms for active stereo systems in visual search tasks
In this work we study how information provided by foveated images sampled according to the log-polar transformation can be integrated over time in order to build accurate world representations and accomplish visual search tasks in an efficient manner. We focus on a specific visual information modality – depth – and on how to store it in a flexible memory structure. We propose a probabilistic ob...
The wisdom of crowds for visual search.
Decision-making accuracy typically increases through collective integration of people's judgments into group decisions, a phenomenon known as the wisdom of crowds. For simple perceptual laboratory tasks, classic signal detection theory specifies the upper limit for collective integration benefits obtained by weighted averaging of people's confidences, and simple majority voting can often approx...